Data science has developed rapidly in recent years, and data is now used for many purposes. A key part of data science is data analysis and visualisation: exploratory data analysis and data visualisation can reveal very useful insights, and data can only be used effectively once it has been analysed and visualised. The aim of this assignment is to analyse and visualise the 2016 policing dataset from Dallas, Texas, and to interpret the findings and results.
The dataset is the 2016 policing dataset from Dallas, Texas.
It contains 2271 rows and 50 columns, with data about incidents that happened in the area. The features mainly describe the police officer involved, the subject involved, the place where the incident happened and the outcome of the incident, for example whether an arrest was made or whether the officer had to use force.
Most of the officers are male: the number of female officers is less than 25% of the number of male officers.
As the graph shows, White officers make up the largest share of officers, followed by Hispanic, Black and Asian officers.
The bar graph above shows the number of male and female officers in each race. In every race, there are fewer female officers than male officers.
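The gender counts and the per-race breakdown behind these bar graphs are simple frequency tables. The sketch below tallies them with the Python standard library. The toy rows are hypothetical, and the category labels ("Male", "Female", race names) are assumptions about how the dataset encodes these fields; the helper name crosstab is illustrative.

```python
from collections import Counter

# Hypothetical toy rows of (officer_race, officer_gender); the labels are
# assumptions, not values read from the Dallas file.
officers = [
    ("White", "Male"), ("White", "Male"), ("White", "Female"),
    ("Hispanic", "Male"), ("Hispanic", "Female"), ("Hispanic", "Male"),
    ("Black", "Male"), ("Black", "Female"), ("Asian", "Male"),
]

def crosstab(rows):
    """Count rows for every (race, gender) pair, as in a grouped bar graph."""
    table = {}
    for race, gender in rows:
        table.setdefault(race, Counter())[gender] += 1
    return table

# Overall gender counts and the female-to-male ratio.
gender_counts = Counter(gender for _, gender in officers)
print("female/male:", gender_counts["Female"] / gender_counts["Male"])

# Per-race gender counts; a missing gender reads as 0 from the Counter.
table = crosstab(officers)
for race, counts in table.items():
    print(race, dict(counts))
```

With pandas, `pd.crosstab(df[race_column], df[gender_column])` builds the same race-by-gender table, where `race_column` and `gender_column` stand for whatever the dataset calls these columns.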
The two bar graphs show how often officers were injured, broken down by race and by gender respectively. For every race, officers were uninjured in more incidents than they were injured, and the same holds for both genders.
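Because the groups differ greatly in size, comparing the share of "Yes" values within each group is more informative than comparing raw bar heights. A minimal sketch, assuming the injury flag is stored as "Yes"/"No" strings (an assumption), with hypothetical toy rows and an illustrative helper name:

```python
from collections import defaultdict

# Hypothetical toy rows of (group, flag); the "Yes"/"No" encoding is assumed.
rows = [
    ("White", "No"), ("White", "No"), ("White", "Yes"),
    ("Black", "No"), ("Black", "No"),
    ("Hispanic", "No"), ("Hispanic", "Yes"), ("Hispanic", "No"), ("Hispanic", "No"),
]

def yes_share_by_group(rows):
    """Return, per group, the fraction of rows whose flag is "Yes"."""
    totals = defaultdict(int)
    yes = defaultdict(int)
    for group, flag in rows:
        totals[group] += 1
        if flag == "Yes":
            yes[group] += 1
    return {group: yes[group] / totals[group] for group in totals}

shares = yes_share_by_group(rows)
print(shares)
```

A share below 0.5 means the group was injured in fewer incidents than not. The same function applies unchanged to the subject injury and arrest flags.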
The bar graph above shows that, as with officers, there are fewer female subjects than male subjects.
This graph shows that most subjects were Black, followed by Hispanic and then White subjects. There are very few American Ind and Asian subjects.
Among Black, Hispanic and White subjects, male subjects far outnumber female subjects. The American Ind and Asian groups contain only male subjects in this dataset.
This graph shows, for subjects of each race, the proportion of involved officers from each race. For Black, Hispanic and White subjects, most of the involved officers were White.
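A proportion graph like this normalises each row of a subject-race by officer-race table so that the row sums to one. A standard-library sketch with hypothetical (subject race, officer race) pairs; the labels and the helper name are illustrative assumptions:

```python
from collections import Counter, defaultdict

# Hypothetical toy (subject_race, officer_race) pairs.
pairs = [
    ("Black", "White"), ("Black", "White"), ("Black", "Hispanic"), ("Black", "Black"),
    ("Hispanic", "White"), ("Hispanic", "Hispanic"), ("Hispanic", "White"),
    ("White", "White"), ("White", "Black"),
]

def row_proportions(pairs):
    """For each subject race, the share of involved officers from each race."""
    counts = defaultdict(Counter)
    for subject_race, officer_race in pairs:
        counts[subject_race][officer_race] += 1
    result = {}
    for subject, c in counts.items():
        total = sum(c.values())  # number of incidents for this subject race
        result[subject] = {officer: n / total for officer, n in c.items()}
    return result

props = row_proportions(pairs)
print(props["Black"])
```

In pandas, `pd.crosstab(subject_races, officer_races, normalize="index")` gives the same row proportions; the same approach covers the officer-gender split by subject gender.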
This graph shows the gender split of the involved officers for male and for female subjects. For both male and female subjects, most of the involved officers were male.
The two bar graphs show how often subjects were injured, broken down by race and by gender respectively. For every race, subjects were uninjured in more incidents than they were injured, and the same holds for both genders.
The two bar graphs show how often subjects were arrested, broken down by race and by gender respectively. For every race, subjects were arrested in more incidents than not, and the same holds for both genders.
The bar graph above shows the incident count for different times of day, namely morning, afternoon and night. Surprisingly, no incidents happened in the evening, so there is no bar for evening. Most incidents occurred in the morning, followed by night, and the fewest occurred in the afternoon.
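The report does not state where it draws the boundaries between morning, afternoon, evening and night, so the bin edges below are assumptions. The sketch also illustrates a general pitfall worth ruling out whenever a time bin comes out empty: if 12-hour times such as "7:30:00 PM" are parsed without their AM/PM marker, no hour above 12 can occur, so an evening bin can never be filled. The "H:MM:SS AM/PM" format, the sample times and the helper names are hypothetical.

```python
from collections import Counter
from datetime import datetime

def time_of_day(hour):
    """Map a 0-23 hour to a label; these bin edges are assumptions."""
    if 5 <= hour < 12:
        return "morning"
    if 12 <= hour < 17:
        return "afternoon"
    if 17 <= hour < 21:
        return "evening"
    return "night"  # 21:00-04:59

def parse_hour(text):
    """Parse a 12-hour time such as '7:30:00 PM' into a 0-23 hour."""
    return datetime.strptime(text, "%I:%M:%S %p").hour

# Hypothetical incident times in an assumed 12-hour format.
times = ["7:30:00 PM", "9:15:00 AM", "1:05:00 PM", "11:45:00 PM", "6:00:00 PM"]

correct = Counter(time_of_day(parse_hour(t)) for t in times)

# Pitfall: taking the digits before ':' and ignoring AM/PM caps hours at 12,
# so "evening" (17-20) can never occur.
naive = Counter(time_of_day(int(t.split(":")[0])) for t in times)

print(correct)
print(naive)
```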
This graph shows the incident count for each month of the year. Most incidents happened in March and the fewest in December.
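Monthly counts come from extracting the month from each incident date. This sketch assumes dates are stored as "M/D/YY" strings (an assumption; the sample dates are hypothetical) and reports all twelve months in calendar order, so a month with no incidents still appears with a count of zero instead of silently disappearing from the graph.

```python
import calendar
from collections import Counter
from datetime import datetime

# Hypothetical incident dates in an assumed "M/D/YY" format.
dates = ["3/2/16", "3/18/16", "3/30/16", "7/4/16", "12/25/16", "1/9/16"]

def incidents_per_month(date_strings, fmt="%m/%d/%y"):
    """Count incidents per calendar month, including months with zero incidents."""
    months = Counter(datetime.strptime(d, fmt).month for d in date_strings)
    # Keep calendar order (January..December) rather than alphabetical order.
    return {calendar.month_name[m]: months[m] for m in range(1, 13)}

per_month = incidents_per_month(dates)
busiest = max(per_month, key=per_month.get)
print(per_month)
print("busiest:", busiest)
```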
This graph shows the incident count for each recorded reason for the incident. ‘Arrest’ was the most common reason and ‘Accidental Discharge’ the least common.
This graph shows the number of incidents by how effective the force used was. If multiple types of force were used, the result of the most recent one was considered. In most cases the force was effective; in fewer cases it was ineffective, and in even fewer cases it had limited effectiveness.
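The rule of considering only the most recent force amounts to taking the last recorded result for each incident. The storage format below, one comma-separated string per incident listing results oldest first, is an assumption, as are the "Yes", "No" and "Limited" labels and the helper name; incidents with nothing recorded are skipped.

```python
from collections import Counter

# Hypothetical per-incident results, one entry per force used, oldest first.
records = ["Yes", "Yes, No", "No", "No, Limited", "Yes", "", "Limited, Yes"]

def most_recent_result(entry):
    """Return the result of the last force listed, or None if nothing is recorded."""
    parts = [p.strip() for p in entry.split(",") if p.strip()]
    return parts[-1] if parts else None

results = Counter(r for r in map(most_recent_result, records) if r is not None)
print(results)
```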
The density plot above shows the distribution of officers' years on the force. Most officers have less than 5 years of experience, and very few have more than 30 years. The curve rises slightly at approximately 7 years and 27 years, indicating slightly more officers at those points.
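A density plot like this is usually a kernel density estimate: a Gaussian bump is centred on every officer's years on the force, and the bumps are summed and scaled so the curve integrates to one. The sketch below implements that directly. The bandwidth uses the common normal-reference rule of thumb (1.06 × standard deviation × n^(-1/5)), which is an assumption rather than the setting used for the plot above; the sample values and helper names are hypothetical.

```python
import math
import statistics

def normal_reference_bandwidth(samples):
    """Rule-of-thumb bandwidth for a Gaussian kernel: 1.06 * sd * n**(-1/5)."""
    return 1.06 * statistics.stdev(samples) * len(samples) ** (-1 / 5)

def gaussian_kde(samples, bandwidth):
    """Return a function estimating the probability density at x."""
    n = len(samples)
    scale = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Sum one Gaussian bump per sample, centred on that sample.
        return scale * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density

# Hypothetical years-on-force values, not taken from the dataset.
years = [1, 2, 2, 3, 3, 4, 4, 7, 7, 8, 12, 27, 28, 32]
h = normal_reference_bandwidth(years)
density = gaussian_kde(years, h)
for x in (0, 5, 10, 20, 30):
    print(x, round(density(x), 4))
```

Small local rises, like those near 7 and 27 years, are sensitive to the bandwidth: a smaller value shows more bumps, a larger one smooths them away. Because Gaussian kernels have unbounded support, the curve also assigns a little density below 0 years. `scipy.stats.gaussian_kde` provides the same estimator and defaults to Scott's rule for the bandwidth.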
The plots above are box plots of years on the force by gender. A box plot summarises the median and quartiles of the data and can be used to locate outliers, which are the points beyond its whiskers. In these interactive plots, hovering over a box also displays its summary statistics. Both male and female officers show outliers.
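Box-plot outliers are conventionally the points lying more than 1.5 times the interquartile range (IQR) beyond the box, which is also matplotlib's default (`whis=1.5`). A standard-library sketch of that rule with hypothetical values; the quartile method chosen here (`method="inclusive"`, linear interpolation) is an assumption, and plotting libraries may compute quartiles slightly differently.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR], the usual whisker fences."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical years-on-force values for one group.
values = [1, 2, 2, 3, 3, 4, 4, 5, 6, 7, 8, 30, 34]
print(iqr_outliers(values))
```

With only a handful of values, the quartiles are pulled toward the extremes, so the rule rarely flags anything; for example, `iqr_outliers([1, 2, 30])` returns an empty list. This fits the observation that the smaller race groups show no outliers.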
The plots above are box plots of years on the force by race. Black, Hispanic and White officers show outliers, while the other races show none, probably because there are very few officers from those races.
The plots above are violin plots of years on the force by gender. A violin plot shows the distribution of a numeric variable by drawing its estimated density on both sides of an axis. Male and female officers have similar distributions of years on the force, except perhaps just above 25 years, where the density for male officers is slightly higher.
The plots above are violin plots of years on the force by officer race. White and Hispanic officers have similar distributions, while the other races have different distributions.
A map showing the places where incidents happened, for male and female subjects:
The map shows that both male and female subjects are distributed
across the area.
A map showing the places where incidents happened, for subjects of different races:
This map shows that the subjects of different races are distributed
across the area.
A map showing the places where incidents happened, by whether or not the subject was arrested:
This map shows that subjects are distributed across the area whether or not they were arrested.
The data required some preprocessing: new features were introduced and missing values were handled. The data was then explored with bar graphs, stacked bar graphs, grouped bar graphs for the same discrete variable, density plots, violin plots, box plots and maps, which produced some interesting observations. There are fewer females than males among both officers and subjects. White officers are the most numerous, while the largest group of subjects is Black. Officers and subjects from the American Ind and Asian races are fewer in number than those from other races. Neither the officer nor the subject was injured in most incidents, although the subject was arrested in most incidents. Most incidents happened in the morning and the fewest in the afternoon. Most incidents happened in March and the fewest in December. Very few officers have more than 30 years of experience, and most have less than 5 years. Years on the force is similarly distributed for male and female officers, and likewise for White and Hispanic officers.